Abstract

A hierarchical set of transcriptional regulators controls the development of multicellular
organisms through sequential activation of transcription factors such as the OVO-like
proteins, which belong to a zinc finger family. Until now, the OVO-like genes of mammals
have been considered orthologs of the OVO gene of Drosophila melanogaster. Using
comparative bioinformatic analysis, we identified three types of OVO-like genes in
vertebrates (OVOL1-OVOL3) and a paralog of OVOL3, named OVOL4, in fish lineages.
Comparing the domains of OVO-like proteins from different metazoan genomes, we found a
basal domain consisting of a tetrad of C2H2 zinc fingers. N- or C-terminal extensions to
this domain gave rise to the various OVO-like proteins in different metazoan lineages. The
extensions range from about one hundred to several hundred amino acids and share no
homology among OVO-like genes from placozoans to mammals. Comparison of the full-length
proteins indicates that the human OVO-like proteins OVOL1-OVOL3 are homologs of
Drosophila OVO, but not orthologs as currently described in databases.

Introduction



Transcriptional regulation is mediated by transcription factors, including a family of
zinc finger proteins that carry zinc finger motifs in their core domain. OVO-like
proteins function as transcription factors that regulate gene expression in various
differentiation processes [1-3]. Drosophila ovo is the prototype of the ovo genes and the
most extensively characterized [4-5]. Its multiple spliced transcripts encode at least
four protein isoforms (A-D), all containing four identical Cys2/His2 (C2H2) zinc fingers
at the C-terminal end but differing at the N-terminal end. A homolog of ovo, lin-48 from
Caenorhabditis elegans, encodes a C2H2 zinc finger protein similar to the product of the
Drosophila ovo gene [2]. OVO-like proteins act as either transcriptional activators or
transcriptional repressors [3,6-9], as is typical of zinc finger proteins. The OVO-like
genes described so far, OVOL1, OVOL2 and OVOL3, have been reported primarily in human and
mouse. Functional studies in selected model organisms (such as D. melanogaster,
C. elegans and mice) have shown that OVO genes are involved in the development and
differentiation of a number of epithelial lineages [2,6,10-16]. Ovol1 represses its own
expression by counteracting c-Myb activation and histone acetylation of its own promoter
[11]. Ovol2 has been identified as a key regulator of neural development in mice [13]. To
date, no functional study has been reported for mammalian OVOL3.

Moreover, very little is known about the molecular evolution of OVO-like genes in the
vertebrate lineage. Here we examine the molecular evolution of vertebrate OVO-like genes
for two main reasons. First, there is no information yet about the functional roles of
OVO-like genes in non-mammalian vertebrates. Second, protein families are expected to
have diverged in fishes, acquiring genetic and functional novelties, because of the whole
genome duplication that occurred in the fish lineage after its separation from the
tetrapod lineage [17-18]; it is therefore of interest to find out what genomic novelties
are associated with OVO-like proteins in fishes. The genomes of a variety of organisms
have recently been completely or nearly completely sequenced, which facilitates the
identification of OVO-like genes in different vertebrate species.

In the present work, using searches based on conservation of nucleotide sequences,
genomic fragments, amino acid sequences and domain architecture, we identified in silico
the putative OVO-like genes of the following vertebrates: five teleost species
(Tetraodon nigroviridis - Tetraodon [19], Takifugu rubripes - Fugu [20], Oryzias latipes
- medaka [21], Gasterosteus aculeatus - stickleback and Danio rerio - zebrafish), one
amphibian species (Xenopus tropicalis - western clawed frog [22]), three avian species
(Gallus gallus - chicken [23], Taeniopygia guttata - zebra finch [24] and Meleagris
gallopavo - turkey), one reptile species (Anolis carolinensis - anole lizard) and four
mammalian representatives (Homo sapiens - human [25], Mus musculus - mouse [26],
Rattus norvegicus - rat [27] and Monodelphis domestica - opossum). Furthermore, we
extended our analysis to other metazoan genomes: lancelet - Branchiostoma floridae [28],
sea urchin - Strongylocentrotus purpuratus [29], fly - Drosophila melanogaster [30],
worm - Caenorhabditis elegans [31], annelid - Helobdella robusta, and mollusc - Lottia
gigantea. The information-to-noise ratio is better in protein sequence alignments than in
DNA alignments because proteins are built from a repertoire of twenty amino acids,
whereas DNA contains only four different bases [32-33]. Putative homologs of vertebrate
OVOL genes were recognized in public genome databases with homology search tools such as
the BLAST suite [34-36] or the FASTA suite [37-38]. Orthology was confirmed by in silico
comparative genomic methods: conserved gene synteny between vertebrate groups (teleosts
vs. amphibians vs. birds vs. mammals), phylogenetic analysis with orthologs from
different species, and functional mapping of orthologs using a set of amino acid residues
known to serve critical roles in protein function. Comparative analyses across vertebrate
species were then used to study the evolution of the OVOLs.
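
The alphabet-size argument for preferring protein alignments can be made concrete with a
small calculation. The sketch below is an illustration only: it assumes equal residue
frequencies, which real sequences do not have, and the function names are hypothetical.

```python
import math


def bits_per_symbol(alphabet_size):
    """Maximum information per alignment position, in bits, for a uniform alphabet."""
    return math.log2(alphabet_size)


def chance_match_probability(alphabet_size):
    """Probability that two random symbols match, assuming equal frequencies."""
    return 1.0 / alphabet_size


# Twenty amino acids versus four nucleotide bases.
protein_bits = bits_per_symbol(20)
dna_bits = bits_per_symbol(4)

print(f"protein: {protein_bits:.2f} bits/position, "
      f"chance match {chance_match_probability(20):.2f}")
print(f"DNA:     {dna_bits:.2f} bits/position, "
      f"chance match {chance_match_probability(4):.2f}")
```

Under this simplification a random match is five times less likely per position in a
protein alignment than in a DNA alignment, which is the sense in which the protein signal
stands out more clearly from the noise.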

Results

Orthology assessment of OVOL1 in vertebrates 

Conservation of the genomic organization of a gene on a given chromosome or scaffold
provides a useful tool for predicting functional interaction. Selective processes are
essential to preserve the organization of such clusters in closely related organisms.
Chromosomal localization and conservation of gene order are therefore vital tools for
assigning orthology within a gene family. We compared the syntenic organization of OVOL1
genes from different vertebrates. In the human genome, OVOL1 is located on chromosome 11
(Figure 1). It is flanked by the triad SIPA1-RELA-KAT5 on one side and by a set of five
genes, SNX32-MUS81-RIBP-FOSL1-BANF1, on the other side within a region of ~380 kb. This
syntenic architecture is maintained in several mammals: mouse (chromosome 19, 400 kb
fragment), rat (chromosome 1, 300 kb fragment) and opossum (chromosome 8, 300 kb
fragment). However, when we searched for this fragment in the bird genomes (chicken,
turkey and zebra finch), we found no OVOL1 gene at all. In addition, the complete genomic
locus comparable to the mammalian fragment is missing from these genomes (Figure 1).

[Figure 1 image not reproduced. Panels: Mammals (human Chr. 11, mouse Chr. 19, rat Chr. 1, opossum Chr. 8); Birds and reptiles (chicken, turkey, zebra finch, anole lizard); Amphibia (Xenopus scaffold_474); Fishes (Fugu scaffold_98, Tetraodon Chr. Un_random, stickleback groupVII, zebrafish Chr. 7).]

Figure 1. Syntenic analysis of the OVOL1 gene from selected vertebrates, flanked by a set of
conserved marker genes.

We obtained the same result as for birds in the only sequenced reptile genome, the anole
lizard A. carolinensis (AnoCar1.0 assembly). In contrast, we traced the OVOL1 gene in the
frog X. tropicalis on scaffold_474, in a 900 kb fragment whose locus is identical to that
of mammals. In the fish genomes, OVOL1 is flanked by the triad DYSF-ECOC6B-DAK on one
side and by MUS81-COL4A5 on the other side, in fragments of 300 kb, 340 kb, 350 kb and
380 kb in Fugu (scaffold_98), Tetraodon (chromosome Un_random), stickleback (groupVII)
and zebrafish (chromosome 7), respectively. Although the sets of marker genes differ
considerably between tetrapods and fishes, the presence of a single copy of the highly
conserved gene MUS81 (encoding the 611 amino acid crossover junction endonuclease MUS81)
next to OVOL1 in all of these vertebrate genomes indicates that the OVOL1 locus is
orthologous from fish to human. OVOL1 orthologs from different vertebrates are listed in
Table 1.
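
The marker-gene comparison used for OVOL1 can be sketched as a set intersection. The gene
names below are the flanking markers listed in the text for the human and fish loci; the
function name is hypothetical, and real flanking-gene lists would come from a genome
browser rather than being typed in by hand.

```python
# Flanking marker genes of OVOL1 as given in the text.
TETRAPOD_FLANK = {"SIPA1", "RELA", "KAT5", "SNX32", "MUS81", "RIBP", "FOSL1", "BANF1"}
FISH_FLANK = {"DYSF", "ECOC6B", "DAK", "MUS81", "COL4A5"}


def shared_markers(*flanks):
    """Return the marker genes present in every flanking set."""
    return set.intersection(*(set(flank) for flank in flanks))


common = shared_markers(TETRAPOD_FLANK, FISH_FLANK)
print(sorted(common))
```

With these lists the only marker common to both loci is MUS81, which is the gene the
orthology argument rests on.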

Orthology assignment of OVOL2 in vertebrates

Tracing OVOL2 orthologs in different vertebrate genomes, we found that the OVOL2 gene is
located on chromosome 20 in the human genome, flanked by the triad RRBP1-BANF2-SNX5 on
one side and by a set of five genes, RP2BP-POLR3F-RBBP5-SEC23B-DTD1, on the other side
within a region of ~900 kb (Figure 2).

[Figure 2 image not reproduced. Panels: Mammals (human Chr. 20, mouse Chr. 2, rat Chr. 3, opossum Chr. 1); Birds and reptiles (chicken Chr. 3, turkey Chr. 2, zebra finch Chr. 3, anole lizard scaffold_366); Amphibia (Xenopus scaffold_614); Fishes (Fugu, Tetraodon, stickleback, medaka, zebrafish).]

Figure 2. Vertebrate OVOL2 orthologs identified by comparing the chromosomal localization of this gene in
evolutionarily important vertebrates.



This genomic fragment is maintained across mammals: mouse (chromosome 2, ~600 kb), rat
(chromosome 3, 600 kb fragment) and opossum (chromosome 1, 1.3 Mb fragment). Comparing
this genomic architecture with bird genomes, we detected a fragment similar to the
mammalian OVOL2 locus, with sizes of 220 kb, 200 kb and 200 kb in chicken (chromosome 3),
zebra finch (chromosome 3) and turkey (chromosome 2), respectively. We also identified
this syntenic organization in the anole lizard on scaffold_366, in a region of about
200 kb. However, we could not detect an OVOL2 gene in the fish genomes, although we found
the marker genes scattered over different loci. This indicates that OVOL2 is missing from
the five fish genomes analyzed. OVOL2 orthologs from different vertebrates are listed in
Table 2.

Unraveling human OVOL3 orthologs in different vertebrates 

While tracing the OVO-like genes, we identified a third gene, OVOL3, in a wide array of
mammals: human (chromosome 19), chimpanzee (chromosome 19), mouse (chromosome 7), rat
(chromosome 1), cow (chromosome 18), pig (chromosome 6) and opossum (chromosome 4), all
with conserved synteny. Within this synteny, OVOL3 is flanked by the octet
LIN37-PRODH2-KIRREL2-APLP1-NKF3ID-LPFN3-SDHAF1-CLIF3 on one side and by the triad
POLR2L-CAPSN1-COX7A1 on the other side, in a region of ~400 kb on human chromosome 19
(Figure 3).

In fishes, this genomic arrangement is not found. Instead, an OVO-like gene is located
within a different genomic organization; we named this gene OVOL4. OVOL4 is flanked by
the tetrad AMOT-HLCS-REXO2-DMPK on one side and by AKT2-2 on the other side on
scaffold_455 in Fugu. A similar architecture is maintained in zebrafish (chromosome 10)
and in medaka (chromosome 14). On searching the fish marker genes back against mammalian
genomes, we found that AKT2, which encodes the enzyme RAC-beta serine/threonine-protein
kinase, has two copies in fishes, AKT2-1 and AKT2-2. We detected AKT2-1 close to the
mammalian OVOL3 cluster, suggesting that the OVOL4 of fishes is paralogous to the OVOL3
of mammals. OVOL3 orthologs from different vertebrates are listed in Table 3, and the
OVOL4 genes of fishes are listed in Table 4.




[Figure 3 image not reproduced; legible panel labels: pig (Chr. 6), opossum (Chr. 4).]

Figure 3. Syntenic conservation of the OVOL3 gene and its paralog OVOL4 from selected vertebrates.

On closer inspection of the chromosomal locations of OVOL1-OVOL3 in vertebrates, we found
that the syntenies of OVOL1 and OVOL2 contain homologous marker genes: the
barrier-to-autointegration factor genes BANF1 and BANF2 (marked in blue in Figure 1 and
Figure 2, respectively) and the sorting nexin homologs SNX32 and SNX5 (marked in yellow
in Figure 1 and Figure 2, respectively). This indicates that OVOL1 and OVOL2 originated
by a segmental duplication more than 450 MY ago, since the arrangement is maintained from
fish to mammals. Surprisingly, birds have only one OVO-like gene, OVOL2. This suggests
either that bird-specific adaptations do not require a second OVO-like gene, or that the
OVO-like genes have similar physiological roles that can be performed by any of the
three.
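
The shared-marker observation for the OVOL1 and OVOL2 loci can be sketched as a search
for flanking genes that belong to the same family. The gene lists and the two family
assignments are taken from the text; the dictionary layout and the function name are
hypothetical, and a real analysis would derive family membership from sequence homology
rather than from a hand-written table.

```python
# Family assignments stated in the text for the homologous marker genes.
MARKER_FAMILY = {
    "BANF1": "barrier-to-autointegration factor",
    "BANF2": "barrier-to-autointegration factor",
    "SNX32": "sorting nexin",
    "SNX5": "sorting nexin",
}

# Flanking genes of the human OVOL1 (chromosome 11) and OVOL2 (chromosome 20) loci.
OVOL1_FLANK = ["SIPA1", "RELA", "KAT5", "SNX32", "MUS81", "RIBP", "FOSL1", "BANF1"]
OVOL2_FLANK = ["RRBP1", "BANF2", "SNX5", "RP2BP", "POLR3F", "RBBP5", "SEC23B", "DTD1"]


def paralogous_marker_pairs(flank_a, flank_b, family):
    """Pair up distinct genes from two loci that are assigned to the same family."""
    pairs = []
    for gene_a in flank_a:
        for gene_b in flank_b:
            family_a = family.get(gene_a)
            if gene_a != gene_b and family_a is not None and family_a == family.get(gene_b):
                pairs.append((gene_a, gene_b))
    return sorted(pairs)


pairs = paralogous_marker_pairs(OVOL1_FLANK, OVOL2_FLANK, MARKER_FAMILY)
print(pairs)
```

Two independent marker families pairing up across the two loci is what makes a
duplication of the whole segment more plausible than a coincidence of single genes.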

Evolutionary history of OVO-like domains in metazoans

Protein sequence analysis of the vertebrate OVO-like (OVOL) proteins showed that OVOL1,
OVOL2, OVOL3 and OVOL4 share a common domain with a highly conserved tetrad of C2H2-type
zinc finger motifs in the C-terminal region. The length of the OVOL proteins varies from
215 to 286 amino acids. OVOL1 proteins are longer, whereas OVOL3 and OVOL4 proteins are
shorter, because of variability in the N- and C-terminal extensions. The secondary
structure of human OVOL1, predicted using PSIPRED [39], is depicted in Figure 4.




Figure 4. Alignment of OVO-like proteins from different vertebrates, B. floridae and N. vectensis. The alignment
was created with MUSCLE [64-65] and edited in GeneDoc [66]. The secondary structure of human OVOL1 was predicted
using PSIPRED [39] and is marked above the alignment. The four C2H2 zinc finger motifs (I-IV) are marked by orange
bars. The rodent OVOL3 protein terminates at position 10 of C2H2 motif IV.



Human OVOL1 contains thirteen α-helices and eleven β-strands. Small stretches of
disordered residues are present in the first 100 amino acids of the N-terminal regions of
the OVOL1-OVOL3 proteins (Figure 5A-5C), as predicted by DISOPRED2 [40].



Figure 5. Presence of disordered regions in different OVO-like proteins.

A. Mouse OVOL1 has disordered residues in the first 100 amino acids.

B. Mouse OVOL2 possesses disordered residues in the first 50 amino acids, with a glycine-rich and a serine-rich
region marked in red.

C. Mouse OVOL3 has disordered segments in the N-terminal 100 residues.

D. Drosophila OVO is intrinsically disordered, with large patches of residue bias indicated in red.

The predictions were obtained with DISOPRED2 [40]. The horizontal line is the order/disorder threshold for the
default false positive rate of 5%. The 'filter' curve represents the outputs from DISOPRED2 and the 'output' curve
the outputs from a linear SVM classifier (DISOPREDsvm). The outputs from DISOPREDsvm are included to
indicate shorter, low-confidence predictions of disorder.
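
A per-residue disorder profile of the kind plotted in Figure 5 can be turned into
segments by thresholding. The sketch below is illustrative and is not the DISOPRED2
algorithm: the scores, the threshold of 0.5, the minimum length and the function name are
all assumptions chosen for the example.

```python
def disordered_segments(scores, threshold=0.5, min_length=1):
    """Return 1-based inclusive (start, end) runs of residues scoring at or above threshold."""
    segments = []
    start = None
    for position, score in enumerate(scores, start=1):
        if score >= threshold:
            if start is None:
                start = position  # a new disordered run begins here
        elif start is not None:
            if position - start >= min_length:
                segments.append((start, position - 1))
            start = None
    # Close a run that extends to the end of the sequence.
    if start is not None and len(scores) - start + 1 >= min_length:
        segments.append((start, len(scores)))
    return segments


# Hypothetical per-residue scores for an eight-residue peptide.
toy_scores = [0.9, 0.8, 0.2, 0.1, 0.7, 0.6, 0.6, 0.3]
print(disordered_segments(toy_scores))
print(disordered_segments(toy_scores, min_length=3))
```

Raising `min_length` discards short runs, which is one simple way to separate the brief
low-confidence stretches from longer disordered regions.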

Mammalian OVO-like proteins contain four C2H2-type zinc finger motifs in the regions
118-140 (23 aa), 146-168 (23 aa), 174-197 (24 aa) and 213-236 (24 aa), numbered according
to human OVOL1. This domain is highly conserved in OVO-like proteins from a wide array of
vertebrates. Drosophila OVO is normally considered the ortholog of the human or mouse
OVO-like genes. Caution is needed here, however, because the Drosophila OVO gene encodes
four alternatively spliced isoforms, named A-D. The OVO-B isoform is the full-length
peptide of 1351 amino acids, whereas isoforms A, C and D are 975, 1222 and 1028 amino
acids long, respectively. Drosophila OVO has four C2H2 zinc finger motifs at the
C-terminal end, at positions 1197-1219 (23 aa), 1225-1247 (23 aa), 1253-1276 (24 aa) and
1292-1315 (24 aa). Furthermore, the full-length Drosophila OVO protein is characterized
by regions of amino acid bias, or low complexity, which are predicted to be disordered by
DISOPRED2 [40]. Drosophila OVO appears to be intrinsically disordered (Figure 5D), with
bias toward specific amino acids: a Glu-rich region at 196-239 (44 aa), a Pro-rich region
at 309-342 (34 aa), a Gly-rich region at 448-618 (171 aa), an Asn-rich region at 620-660
(41 aa), a His-rich region at 645 - 65 (39 aa), a Gln-rich region at 837-1158 (322 aa),
an Ala-rich region at 1001-1059 (59 aa) and a Ser-rich region at 1025-1045 (21 aa).
Vertebrate OVO-like proteins and Drosophila OVO share strong conservation at the
C-terminal end, which carries the tetrad of C2H2 zinc finger motifs.
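
The zinc finger coordinates quoted for human OVOL1 and Drosophila OVO can be checked
against each other with a few lines of arithmetic. The coordinates are those given in the
text; the helper names are hypothetical.

```python
# 1-based inclusive coordinates of zinc finger motifs I-IV, as given in the text.
HUMAN_OVOL1_FINGERS = [(118, 140), (146, 168), (174, 197), (213, 236)]
DROSOPHILA_OVO_FINGERS = [(1197, 1219), (1225, 1247), (1253, 1276), (1292, 1315)]


def motif_lengths(fingers):
    """Length in residues of each motif."""
    return [end - start + 1 for start, end in fingers]


def linker_lengths(fingers):
    """Number of residues between consecutive motifs."""
    return [fingers[i + 1][0] - fingers[i][1] - 1 for i in range(len(fingers) - 1)]


def start_offsets(fingers_a, fingers_b):
    """Shift between the start positions of corresponding motifs in two proteins."""
    return [start_b - start_a for (start_a, _), (start_b, _) in zip(fingers_a, fingers_b)]


print(motif_lengths(HUMAN_OVOL1_FINGERS), motif_lengths(DROSOPHILA_OVO_FINGERS))
print(linker_lengths(HUMAN_OVOL1_FINGERS), linker_lengths(DROSOPHILA_OVO_FINGERS))
print(start_offsets(HUMAN_OVOL1_FINGERS, DROSOPHILA_OVO_FINGERS))
```

The motif lengths (23, 23, 24, 24) and the linker lengths (5, 5, 15) are identical in the
two proteins, and every motif is shifted by the same 1079 residues, so the spacing of the
tetrad is preserved while the Drosophila protein is much longer upstream of it.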

To further delineate the OVO-like proteins of different metazoans, we traced OVO-like
genes in a set of metazoan genomes using the BLAST suite [34-36]. Homology searches with
the human OVO-like genes identified two highly conserved genes in the Branchiostoma
floridae genome, with JGI accession ids e_gw.374.48.1 and e_gw.236.92.1. Searching with
these genes in the genome of the sea anemone Nematostella vectensis, we identified two
genes with high conservation to the human OVO domains, with JGI accession ids gw.31.97.1
and e_gw.31.122.1, respectively. The OVO-like proteins of these two species consist of a
single domain with the tetrad of C2H2 zinc finger motifs found in vertebrate OVO-like
proteins (Figure 6).

Figure 6. Protein alignment of OVO-like proteins from mouse, lancelet and sea anemone, depicting
conservation at the C-terminal end, which carries the tetrad of C2H2 zinc finger motifs (marked by red bars).
The alignment was created with MUSCLE [64-65] and edited in Jalview [67-68].



To understand the molecular evolution of the OVO-like proteins, a phylogenetic tree
(Figure 7) was constructed by the Neighbor-Joining method [41] with 1000 bootstrap
replicates [42] using MEGA4 [43]. The tree separates the different types of OVO-like
orthologs, OVOL1, OVOL2 and OVOL3-OVOL4, from the basal B. floridae and N. vectensis
OVO-like proteins, named OVOBFL1/OVOBFL2 and OVONVE1/OVONVE2, respectively.
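
Two ingredients of a bootstrapped distance tree, a pairwise distance and the resampling
of alignment columns, can be sketched as follows. This is a minimal illustration under
stated assumptions, not the MEGA4 procedure: it uses the simple p-distance (the
substitution model is not stated in the text), the toy alignment is invented, and the
tree-building step itself is omitted.

```python
import random


def p_distance(seq_a, seq_b):
    """Proportion of differing residues over columns where neither sequence has a gap."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    pairs = [(a, b) for a, b in zip(seq_a, seq_b) if a != "-" and b != "-"]
    if not pairs:
        raise ValueError("no ungapped columns to compare")
    return sum(a != b for a, b in pairs) / len(pairs)


def distance_matrix(alignment):
    """Pairwise p-distances keyed by (name_1, name_2) with name_1 < name_2."""
    names = sorted(alignment)
    return {(m, n): p_distance(alignment[m], alignment[n])
            for i, m in enumerate(names) for n in names[i + 1:]}


def bootstrap_alignment(alignment, rng):
    """Resample alignment columns with replacement, keeping the original length."""
    length = len(next(iter(alignment.values())))
    columns = [rng.randrange(length) for _ in range(length)]
    return {name: "".join(seq[c] for c in columns) for name, seq in alignment.items()}


# Hypothetical gap-free toy alignment, not taken from the OVO-like proteins.
toy_alignment = {"seqA": "CPECGKSF", "seqB": "CPECGKAF", "seqC": "CAECGRSF"}
observed = distance_matrix(toy_alignment)

rng = random.Random(0)  # fixed seed so the resampling is repeatable
replicates = [distance_matrix(bootstrap_alignment(toy_alignment, rng)) for _ in range(1000)]
print(observed)
```

Each replicate matrix would be passed to the tree-building step, and the bootstrap value
of a branch is the percentage of replicate trees in which that grouping reappears.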

Figure 7. Evolutionary history of vertebrate OVO-like proteins inferred by the Neighbor-Joining method [41] with
1000 bootstrap replicates. Branches corresponding to partitions reproduced in less than 35% of bootstrap
replicates are collapsed. The percentage of replicate trees in which the OVO-like proteins clustered together in the
bootstrap test (1000 replicates) is shown next to the branches [42]. Phylogenetic analyses were conducted in MEGA4
[43]. Different colors indicate the various types of OVO-like genes: the OVOL1-OVOL3 orthologs and the fish paralog
of OVOL3, named OVOL4. The OVO-like genes from lancelet and sea anemone cluster separately in the tree.

Searching further with the human OVO-like proteins in the genome of the leech H. robusta,
we identified OVO-like proteins with accession ids e_gw1.1.1891.1 and e_gw1.4.1162.1. In
the JGI L. gigantea v1.0 genome, homology searches identified fgenesh2_pg.C_sca_13000299
and e_gw1.13.34.1 as OVO-like genes. Two OVO-like proteins were detected in the genome of
the sea urchin S. purpuratus, with accession ids XP_796652.2 and XP_788176.1. These two
S. purpuratus proteins share the conserved domain carrying the C2H2 zinc fingers, in
addition to non-homologous N-terminal extensions. Comparing the four C2H2-type zinc
finger motifs from sea anemone to vertebrates, we found that all four are conserved
(Figure 8) and that they are typical C2H2-type zinc finger motifs as described in the
Prosite pattern database under accession number PS00028.
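
A pattern scan of the kind that PS00028 describes can be sketched with a regular
expression. The PROSITE pattern C-x(2,4)-C-x(3)-[LIVMFYWC]-x(8)-H-x(3,5)-H is translated
by hand below; the test sequence is a synthetic, hypothetical construct built from a
generic C2H2 consensus and is not an OVO-like protein, and the function name is an
assumption.

```python
import re

# PROSITE PS00028 (C2H2 zinc finger) written as a regular expression:
# C-x(2,4)-C-x(3)-[LIVMFYWC]-x(8)-H-x(3,5)-H
C2H2_PATTERN = re.compile(r"C.{2,4}C.{3}[LIVMFYWC].{8}H.{3,5}H")


def find_c2h2_motifs(sequence):
    """Return (start, end, matched_text) for each hit, with 1-based inclusive coordinates."""
    return [(m.start() + 1, m.end(), m.group()) for m in C2H2_PATTERN.finditer(sequence)]


# Synthetic protein: two copies of a generic finger joined by a short linker.
FINGER = "YKCPECGKSFSQKSNLQKHQRTH"
toy_protein = "MA" + FINGER + "TGEKP" + FINGER + "GS"

hits = find_c2h2_motifs(toy_protein)
for start, end, text in hits:
    print(start, end, text)
```

The pattern spans only the stretch from the first cysteine to the last histidine, so a
hit is a few residues shorter than a motif annotated with its flanking residues.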



Figure 8. Sequence logos of the four Cys2-His2 (C2H2) zinc finger motifs (I-IV) present in OVO-like
proteins from sea anemone to human. The logos were generated from a comprehensive protein alignment
of OVO-like proteins (supplementary figure S1) using WebLogo 3.0 [69]. C2H2 zinc finger motif IV has 25
positions because of one extra amino acid in the OVOLNVE1 protein from sea anemone, at the eleventh position
of the logo.
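
The height of each stack in a sequence logo reflects the information content of the
corresponding alignment column. The sketch below computes that quantity for a protein
alignment as log2(20) minus the column entropy. It is a simplified illustration: it omits
the small-sample correction that logo software typically applies, the toy motif is
invented, and the function names are hypothetical.

```python
import math
from collections import Counter


def column_information(column, alphabet_size=20):
    """Information content of one alignment column in bits, ignoring gap characters."""
    residues = [r for r in column if r != "-"]
    if not residues:
        return 0.0
    total = len(residues)
    entropy = -sum((n / total) * math.log2(n / total) for n in Counter(residues).values())
    return math.log2(alphabet_size) - entropy


def logo_heights(aligned_sequences):
    """Information content of every column of an alignment, from left to right."""
    return [column_information(column) for column in zip(*aligned_sequences)]


# Hypothetical four-column alignment with invariant first and last positions.
toy_motif = ["CPEC", "CAEC", "CPDC", "CSEC"]
heights = logo_heights(toy_motif)
print([round(h, 2) for h in heights])
```

Invariant columns, such as the zinc-coordinating cysteines and histidines, reach the
maximum of log2(20) bits, while variable columns fall below it.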

Figure 9 depicts the domain evolution of OVO-like proteins in different metazoan lineages
over a period of more than 700 million years.

Figure 9. Protein domain evolution of OVO-like proteins in different metazoan lineages over a period of more
than 700 million years. A highly conserved domain consisting of a tetrad of C2H2 zinc finger motifs (red and yellow
boxes) is found in all evolutionarily important organisms. Extensions to this domain, primarily N-terminal, lead to
the different types of protein products; the exception is jgi|Helro1|63219 in the leech H. robusta, where the
extension is at the C-terminal end of the C2H2 zinc finger domain. The time scale follows Kumar and Hedges (2003)
[70] and Ponting (2008) [71].

Basal metazoans such as the sea anemone possess only the OVO-like domain carrying the
C2H2 zinc fingers. N- or C-terminal extensions to this domain gave rise to the different
types of OVO-like proteins. The extensions are predominantly N-terminal; the exception is
jgi|Helro1|63219 in the leech H. robusta, where the extension is at the C-terminal end of
the C2H2 zinc finger domain. The extensions vary from about one hundred to several
hundred amino acids: vertebrate OVO-like proteins and LIN48 from C. elegans have an
extension of 100-120 amino acids containing a disordered region (Figure 10), whereas
Drosophila OVO and the S. purpuratus OVO-like proteins have N-terminal extensions of
several hundred amino acids. The extended regions share no homology with OVO-like
proteins from evolutionarily distant organisms. Because the full-length domains of
orthologous proteins are expected to be conserved, this indicates that the proteins from
different lineages are only homologs, not orthologs, of Drosophila OVO, contrary to
current annotations in different databases.
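
The size of the extensions on either side of the zinc finger tetrad follows directly from
the protein length and the motif coordinates. The sketch below uses values reported in
this paper (human OVOL1: 267 aa, motifs spanning 118-236; Drosophila OVO-B: 1351 aa,
motifs spanning 1197-1315); the function name is hypothetical.

```python
def terminal_extensions(protein_length, first_motif_start, last_motif_end):
    """Return (n_terminal, c_terminal) residue counts outside the zinc finger tetrad."""
    n_terminal = first_motif_start - 1
    c_terminal = protein_length - last_motif_end
    return n_terminal, c_terminal


# (protein length, start of motif I, end of motif IV), 1-based coordinates from the text.
PROTEINS = {
    "human OVOL1": (267, 118, 236),
    "Drosophila OVO-B": (1351, 1197, 1315),
}

for name, (length, start, end) in PROTEINS.items():
    n_ext, c_ext = terminal_extensions(length, start, end)
    print(f"{name}: N-terminal {n_ext} aa, C-terminal {c_ext} aa")
```

The C-terminal tails are of similar size in the two proteins (31 and 36 residues),
whereas the N-terminal extension grows from 117 residues in human OVOL1 to 1196 residues
in Drosophila OVO-B.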

Figure 10. The LIN48 protein from C. elegans is characterized by a disordered segment in its
first 100 residues, as predicted by DISOPRED2 [40]. Further details as in Figure 5.



Discussion 



This study provides the first comprehensive description of the gene sequences, structural
features and molecular phylogeny of vertebrate OVO-like genes. We have identified
orthologs of the human OVO-like genes from fish to mammals, spanning more than 450 MY.
Birds have only one OVO-like gene, OVOL2, whereas fishes have two: OVOL1 and a paralog of
OVOL3, named here OVOL4. The presence of homologous marker genes in close vicinity to the
OVOL1 and OVOL2 loci suggests that OVOL1 and OVOL2 originated by a segmental duplication.
Because these fragments are maintained from fish to human, the duplication predates the
separation of fishes from the tetrapod lineage about 450 MY ago. The differential
presence of OVO-like genes in the bird, reptile and fish lineages points to a role in the
development and differentiation of epithelial cells that is tailored to animal-specific
requirements; for example, birds, which needed to reduce their body mass, may not need as
many OVO-like genes as mammals. The regulation of epithelial cell and tissue development
may thus have been adapted as organisms adopted different modes of life, such as flying
or swimming under water.

Prior to this work, Drosophila OVO was considered the ortholog of the mammalian OVO-like
genes in databases such as Ensembl V58 [44-45]. Because of the differences in peptide
composition and peptide length, however, orthology with Drosophila OVO cannot be assigned
across metazoan lineages. OVO-like proteins from different metazoan lineages do possess a
conserved domain consisting of four C2H2 zinc finger motifs. Extensions to this domain,
predominantly N-terminal and containing disordered segments, gave rise to the complete
domains of the various OVO-like proteins over a period of 700 MY. Only one case of a
C-terminal extension was found, in H. robusta. The extended regions of different OVO-like
proteins share no significant homology. They consist primarily of disordered residues and
constitute a non-folding domain with patches in which the same residue occurs repeatedly.
In the post-genomic era, many eukaryotic genome sequences are available, and their
encoded proteins show that disordered proteins are surprisingly common in eukaryotes.
Furthermore, with advances in genome sequencing and experimental methods, it has become
clear that disordered domains occur in many functional proteins [46-49], particularly
those involved in regulatory processes, such as cell signaling proteins [50-52] and
transcriptional regulators [53].

Disordered segments vary in size from a few amino acids to an entire domain of several
hundred amino acids, or even an entire protein as large as ~200 kDa [54]. Since OVO-like
proteins are transcriptional regulators, disordered segments of varied length are to be
expected in them as well. Many known disordered proteins or segments differ markedly in
sequence and share little homology [54]. The extended segments of OVO-like proteins thus
belong to the same class of non-homologous disordered protein segments or domains.

This study advances the present understanding of OVOL genes in metazoan genomes. It also
provides a platform for those interested in characterizing OVO-like genes in divergent
species and in vertebrate model systems such as Xenopus, Gallus and Danio.

Materials and Methods

Data sources of genomic, cDNA and protein sequences 

The genomic DNA, cDNA and protein sequences from different eukaryotes were retrieved by
BLAST [55] searches, using human or mouse OVOL1 as the query sequence, from the following
genome databases: the human, mouse, rat and zebrafish genomes from the National Center
for Biotechnology Information (NCBI) [56]; Takifugu, Xenopus tropicalis, Branchiostoma
floridae v1.0, Helobdella robusta v1.0, Nematostella vectensis v1.0 and Lottia gigantea
v1.0 from the DOE Joint Genome Institute (JGI) [57]; Tetraodon nigroviridis from Ensembl
[44-45] and the French National Sequencing Center (Genoscope) [58]; and the
Strongylocentrotus purpuratus genome from the Human Genome Sequencing Center (HGSC),
Baylor College of Medicine [59].

Micro-synteny analysis across different genomes 

To verify orthology, micro-synteny across different genomes was analyzed using the NCBI
Map Viewer [60], the Ensembl genome browser [44-45], the JGI genome browser [57], the
Tetraodon genome browser at Genoscope [58] and the UCSC genome browser [61].

Sequence alignment of different OVO like proteins 

Protein alignments of the OVO-like proteins were generated with CLUSTALX 1.83 [62-63] or
MUSCLE [64-65]. The alignments were edited, and their sequence characteristics
visualized, using GENEDOC [66] or JALVIEW [67-68].

Table 1. List of OVOL1 genes from selected vertebrate genomes, identified from Ensembl database V58 (May 2010).

Organism     Gene Id             Genomic Location      Length (bp)  Protein Id          Length (aa)
Human        ENSG00000172818     Chromosome 11         2991         ENSP00000337862     267
Mouse        ENSMUSG00000024922  Chromosome 19         2900         ENSMUSP00000025861  267
Rat          ENSRNOG00000020669  Chromosome 1          2993         ENSRNOP00000028081  267
Opossum      ENSMODG00000009534  Chromosome 8          810          ENSMODP00000032966  269
Frog         ENSXETG00000020587  scaffold_474          807          ENSXETP00000044473  268
Fugu         ENSTRUG00000011790  scaffold_98           961          ENSTRUP00000029762  286
Stickleback  ENSGACG00000018794  groupVII              1262         ENSGACP00000024844  274
Tetraodon    ENSTNIG00000005499  Chromosome Un_random  846          ENSTNIP00000008194  281
Zebrafish    ENSDARG00000079995  Chromosome 7          960          ENSDARP00000102900  256

Table 2. List of OVOL2 genes from selected vertebrate genomes, identified from Ensembl database V58 (May 2010).

Organisms     Scientific Name        Gene ID             Chromosomal location  Transcript ID       Length (bp)  Protein ID          Length (aa)
Human         Homo sapiens           [illegible]         Chromosome 20         [illegible]         [illegible]  [illegible]         [illegible]
Mouse         Mus musculus           ENSMUSG00000037979  Chromosome 2          ENSMUST00000037493  1539         ENSMUSP00000044096  274
Rat           Rattus norvegicus      ENSRNOG00000006850  Chromosome 3          ENSRNOT00000009226  1263         ENSRNOP00000009226  274
Opossum       Monodelphis domestica  ENSMODG00000005504  Chromosome 1          ENSMODT00000006948  798          ENSMODP00000006810  265
Chicken       Gallus gallus          ENSGALG00000008702  Chromosome 3          ENSGALT00000014161  816          ENSGALP00000014145  262
Zebrafinch    Taeniopygia guttata    ENSTGUG00000005909  Chromosome 3          ENSTGUT00000006128  789          ENSTGUP00000006069  263
Turkey        Meleagris gallopavo    ENSMGAG00000006770  Chromosome 2          ENSMGAT00000007574  504          ENSMGAP00000006816  167
Anole Lizard  Anolis carolinensis    ENSACAG00000014216  scaffold_366          ENSACAT00000014246  780          ENSACAP00000013960  260
Frog          Xenopus tropicalis     ENSXETG00000024897  scaffold_614          ENSXETT00000053523  2724         ENSXETP00000053523  287

Table 3. List of OVOL3 genes from selected mammalian genomes, identified from Ensembl database V58 (May 2010).

Organisms   Scientific Name        Gene ID             Chromosomal location  Transcript ID       Length (bp)  Protein ID          Length (aa)
Human       Homo sapiens           ENSG00000105261     Chromosome 19         ENST00000262637     558          ENSP00000262637     185
Chimpanzee  Pan troglodytes        ENSPTRG00000010884  Chromosome 19         ENSPTRT00000020163  654          ENSPTRP00000018647  217
Mouse       Mus musculus           ENSMUSG00000056028  Chromosome 7          ENSMUST00000047308  821          ENSMUSP00000045372  220
Rat         Rattus norvegicus      ENSRNOG00000024880  Chromosome 1          ENSRNOT00000041301  615          ENSRNOP00000041919  205
Cow         Bos taurus             ENSBTAG00000015001  Chromosome 18         ENSBTAG00000015001  666          ENSBTAP00000032029  222
Pig         Sus scrofa             ENSSSCG00000002924  Chromosome 19         ENSSSCT00000003230  636          ENSSSCP00000003149  212
Opossum     Monodelphis domestica  ENSMODG00000009534  Chromosome 8          ENSMODT00000034545  810          ENSMODP00000032966  269

Table 4. List of OVOL4 genes from fish genomes, identified from Ensembl database V58 (May 2010).

Organism   Scientific Name        Gene Id             Genomic Location      Transcript Id       Length (bp)  Protein Id          Length (aa)
Fugu       Takifugu rubripes      ENSTRUG00000008625  scaffold_455          ENSTRUT00000021725  393*         ENSTRUP00000021637  131
Medaka     Oryzias latipes        ENSORLG00000005380  Chromosome 14         ENSORLT00000006784  654          ENSORLP00000006783  218
Zebrafish  Danio rerio            ENSDARG00000079995  Chromosome 10         ENSDART00000108918  960          ENSDARP00000099960  253

*Partial sequences



